Direct AUC optimization of regulatory motifs
Authors
Abstract
Motivation: The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large volume of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and can fail to fully utilize the information in the input sequences.

Results: We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by emerging deep learning approaches for predicting TF sequence specificities.

Availability and Implementation: CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 .

Contact: [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.
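The piece-wise constant property mentioned in the Results can be illustrated with a small sketch. This is not the CDAUC implementation: it assumes a plain linear scorer (score = w · x) in place of a motif model, and the names `auc` and `best_coordinate_value` are hypothetical. With all other weights fixed, each (positive, negative) pair changes its ordering at a single value of the chosen weight, so the AUC is constant between those breakpoints and one candidate per interval is enough for an exact search.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def best_coordinate_value(w, X_pos, X_neg, j):
    """Exact 1-D AUC search over w[j] for a linear scorer (illustrative assumption)."""
    # Scores with coordinate j's contribution removed.
    base_pos = X_pos @ w - X_pos[:, j] * w[j]
    base_neg = X_neg @ w - X_neg[:, j] * w[j]
    # For pair (p, n) the score gap is a[p, n] + t * b[p, n] when w[j] = t.
    a = base_pos[:, None] - base_neg[None, :]
    b = X_pos[:, j][:, None] - X_neg[:, j][None, :]
    # Start from the current value so the result never gets worse.
    best_t, best_auc = w[j], auc(X_pos @ w, X_neg @ w)
    mask = b != 0
    if not mask.any():
        return best_t, best_auc  # AUC does not depend on w[j]
    # Each pair flips at t = -a/b; the AUC is constant between flips.
    breaks = np.unique(-a[mask] / b[mask])
    # One candidate per constant piece: interval midpoints plus both open ends.
    candidates = np.concatenate((
        [breaks[0] - 1.0],
        (breaks[:-1] + breaks[1:]) / 2.0,
        [breaks[-1] + 1.0],
    ))
    for t in candidates:
        val = auc(base_pos + t * X_pos[:, j], base_neg + t * X_neg[:, j])
        if val > best_auc:
            best_t, best_auc = float(t), val
    return best_t, best_auc
```

Calling `best_coordinate_value` for each coordinate in turn, and repeating the sweep until the AUC stops improving, gives a basic coordinate-wise ascent. This brute-force version evaluates every interval from scratch; the abstract states that CDAUC solves a key step as a computational geometry problem, which this sketch does not attempt to reproduce.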
Similar resources
Discriminative motif optimization based on perceptron training
MOTIVATION Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amount...
Exploring Design Principles of Gene Regulatory Networks Via Pareto Optimality
One central problem in systems and synthetic biology is to characterize the biological functions of regulatory network motifs. Here we consider recent model-based exploration approaches used to identify motifs capable of performing a specific biological task. In this work, we propose an optimization-based strategy where the motivation is twofold: on the one hand, to introduce efficiency and opt...
Bioequivalence Comparison of Two Formulations of Fixed-Dose Combination Glimepiride/Metformin (2/500 mg) Tablets in Healthy Volunteers
Glimepiride/metformin (2/500 mg) is an oral antihyperglycemic agent for the treatment of type 2 diabetes. A generic glimepiride/metformin (2/500 mg) fixed-dose combination (FDC) tablet was developed recently. This study was designed to collect data for submission to Korean regulatory authorities to allow the marketing of the test formulation. We evaluated the comparative bioavailability and tolerabi...
Identification of co-occurring transcription factor binding sites from DNA sequence using clustered position weight matrices
Accurate prediction of transcription factor binding sites (TFBSs) is a prerequisite for identifying cis-regulatory modules that underlie transcriptional regulatory circuits encoded in the genome. Here, we present a computational framework for detecting TFBSs, when multiple position weight matrices (PWMs) for a transcription factor are available. Grouping multiple PWMs of a transcription factor ...
Regression trees for regulatory element identification
MOTIVATION The transcription of a gene is largely determined by short sequence motifs that serve as binding sites for transcription factors. Recent findings suggest direct relationships between the motifs and gene expression levels. In this work, we present a method for identifying regulatory motifs. Our method makes use of tree-based techniques for recovering the relationships between motifs a...
Journal title:
Volume 33, Issue
Pages -
Publication date: 2017